AITopics | beta 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reviews: Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function

Neural Information Processing Systems | Jan-26-2025, 03:17:37 GMT

The submission studies adversarial online learning in episodic loop-free Markov decision processes (MDPs). The work is important because it is the first to provide an understanding of an adversarial online learning problem in which the transition function is unknown, the loss functions change over time, and the feedback is bandit. The related work section clearly describes how this line of research has progressed from a fixed unknown transition function and an unknown loss function to the setting studied in this submission. Although the MDPs considered in the submission are L-layered and loop-free, the results and the analysis pave the way for general MDPs. The main idea is to design confidence sets that include the optimal occupancy measure, which induces the optimal policy.

feedback and unknown transition function, online stochastic shortest path, submission, (8 more...)

Neural Information Processing Systems

Industry: Education (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Understanding overfitting in random forest for probability estimation: a visualization and simulation study

Barreñada, Lasai, Dhiman, Paula, Timmerman, Dirk, Boulesteix, Anne-Laure, Van Calster, Ben

arXiv.org Artificial Intelligence | Sep-30-2024

Random forests have become popular for clinical risk prediction modelling. In a case study on predicting ovarian malignancy, we observed training c-statistics close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behaviour of random forests by (1) visualizing data space in three real world case studies and (2) a simulation study. For the case studies, risk estimates were visualised using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true c-statistic and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 were simulated and RF models trained with minimum node size 2 or 20 using the ranger package, resulting in 192 scenarios in total. The visualizations suggested that the model learned spikes of probability around events in the training set. A cluster of events created a bigger peak, whereas isolated events created local peaks. In the simulation study, median training c-statistics were between 0.97 and 1 unless there were 4 or 16 binary predictors with minimum node size 20. Median test c-statistics were higher with higher events per variable, higher minimum node size, and binary predictors. Median training slopes were always above 1, and were not correlated with median test slopes across scenarios (correlation -0.11). Median test slopes were higher with higher true c-statistic, higher minimum node size, and higher sample size. Random forests learn local probability peaks that often yield near perfect training c-statistics without strongly affecting c-statistics on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
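
The c-statistic that the abstract reports can be read as the share of (event, non-event) pairs in which the event receives the higher predicted risk. The sketch below is only an illustration of that definition in plain Python: the function name `c_statistic` and the toy risk values are hypothetical, and the study itself fits its models with the ranger package in R.

```python
def c_statistic(risks, outcomes):
    """Concordance between predicted risks and binary outcomes (1 = event)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks for two events and two non-events.
outcomes = [1, 0, 1, 0]
separated = c_statistic([0.9, 0.1, 0.8, 0.2], outcomes)   # every event outranks every non-event
overlapping = c_statistic([0.9, 0.8, 0.3, 0.2], outcomes)  # one of four pairs is discordant
```

A value of 1 means every event outranks every non-event, which is the near-perfect training behaviour the abstract describes; it says nothing by itself about how well calibrated the probabilities are.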

predictor, probability, probability estimation, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1186/s41512-024-00177-1

arXiv: 2402.18612

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

On Optimizing Hyperparameters for Quantum Neural Networks

Herbst, Sabrina, De Maio, Vincenzo, Brandic, Ivona

arXiv.org Artificial Intelligence | Mar-27-2024

The increasing capabilities of Machine Learning (ML) models go hand in hand with an immense amount of data and computational power required for training. Therefore, training is usually outsourced to HPC facilities, where we have started to experience limits in scaling conventional HPC hardware, as theorized by Moore's law. Despite heavy parallelization and optimization efforts, current state-of-the-art ML models require weeks for training, which is associated with an enormous CO2 footprint. Quantum Computing, and specifically Quantum Machine Learning (QML), can offer significant theoretical speed-ups and enhanced expressive power. However, training QML models requires tuning various hyperparameters, which is a nontrivial task, and suboptimal choices can strongly affect the trainability and performance of the models. In this study, we identify the most impactful hyperparameters and collect data about the performance of QML models. We compare different configurations and provide researchers with performance data and concrete suggestions for hyperparameter selection.

configuration, dataset, experiment, (17 more...)

arXiv.org Artificial Intelligence

arXiv: 2403.18579

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Class-wise and reduced calibration methods

Panchenko, Michael, Benmerzoug, Anes, Delgado, Miguel de Benito

arXiv.org Artificial Intelligence | Oct-7-2022

For many applications of probabilistic classifiers it is important that the predicted confidence vectors reflect true probabilities (one says that the classifier is calibrated). It has been shown that common models fail to satisfy this property, making reliable methods for measuring and improving calibration important tools. Unfortunately, obtaining these is far from trivial for problems with many classes. We propose two techniques that can be used in tandem. First, a reduced calibration method transforms the original problem into a simpler one. We prove for several notions of calibration that solving the reduced problem minimizes the corresponding notion of miscalibration in the full problem, allowing the use of non-parametric recalibration methods that fail in higher dimensions. Second, we propose class-wise calibration methods, based on intuition building on a phenomenon called neural collapse and the observation that most of the accurate classifiers found in practice can be thought of as a union of K different functions which can be recalibrated separately, one for each class. These typically out-perform their non class-wise counterparts, especially for classifiers trained on imbalanced data sets. Applying the two methods together results in class-wise reduced calibration algorithms, which are powerful tools for reducing the prediction and per-class calibration errors. We demonstrate our methods on real and synthetic datasets and release all code as open source at https://github.com/appliedAI-Initiative
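
To make the notion of a per-class calibration error concrete, here is a rough sketch of one common binned estimator, written in plain Python. It is an assumption for illustration, not the authors' released code: the function name `classwise_calibration_error`, the equal-width binning, and the toy inputs are all hypothetical. For each class, predictions are grouped by predicted probability and the mean predicted probability in each bin is compared with the observed frequency of that class.

```python
def classwise_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error, averaged over classes.

    probs  -- list of per-sample probability vectors
    labels -- list of true class indices
    """
    n = len(probs)
    n_classes = len(probs[0])
    total = 0.0
    for k in range(n_classes):
        conf_sum = [0.0] * n_bins   # summed predicted probability of class k per bin
        hit_sum = [0.0] * n_bins    # number of samples in the bin whose label is k
        count = [0] * n_bins
        for p, y in zip(probs, labels):
            b = min(int(p[k] * n_bins), n_bins - 1)  # equal-width bin index
            conf_sum[b] += p[k]
            hit_sum[b] += 1.0 if y == k else 0.0
            count[b] += 1
        # (bin weight) * |frequency - mean confidence| simplifies to |hits - conf| / n
        total += sum(abs(hit_sum[b] - conf_sum[b]) / n
                     for b in range(n_bins) if count[b])
    return total / n_classes


# Hypothetical two-class example: the model always says 0.8 for class 0,
# but class 0 occurs only half the time.
overconfident = classwise_calibration_error([[0.8, 0.2]] * 4, [0, 0, 1, 1])
```

In this toy case the gap between the stated 0.8 and the observed 0.5 gives an error of about 0.3 for each class. Binned estimators of this kind become unreliable as the number of classes grows, which is the difficulty the abstract points to.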

calibration, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.03702

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Cost Function

#artificialintelligence | Mar-27-2021, 05:35:33 GMT

In linear regression, different values of the slope and intercept give different candidate lines. The main question is which of those lines best represents the relationship between X and Y, and the Mean Squared Error (MSE) is the criterion we can use to decide. For linear regression, the MSE serves as the cost function. It is the mean of the squared differences between the predicted and true values, so its output is a single number representing the cost. The line with the minimum MSE is therefore the one that best represents the relationship between X and Y.
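
As a minimal sketch of this idea, the snippet below scores a few candidate lines by their MSE, computed as the average of the squared differences between predictions and true values, and keeps the cheapest one. The data points, the candidate (slope, intercept) pairs, and the function names are made up for illustration.

```python
def predict(slope, intercept, x):
    """Value of the line y = slope * x + intercept at x."""
    return slope * x + intercept


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared difference between predictions and true values."""
    squared = [(y - predict(slope, intercept, x)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / len(squared)


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Candidate lines as (slope, intercept) pairs.
candidates = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]
costs = {c: mean_squared_error(c[0], c[1], xs, ys) for c in candidates}
best = min(costs, key=costs.get)  # line with the lowest cost
```

Here the line with slope 2 and intercept 1 has a cost of zero because it passes through every point, so it is selected as the best fit.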

beta 0, cost function, mse, (6 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.57)

Predictive Information Accelerates Learning in RL

Lee, Kuang-Huei, Fischer, Ian, Liu, Anthony, Guo, Yijie, Lee, Honglak, Canny, John, Guadarrama, Sergio

arXiv.org Artificial Intelligence | Jul-24-2020

The Predictive Information is the mutual information between the past and the future, I(X_past; X_future). We hypothesize that capturing the predictive information is useful in RL, since the ability to model what will happen next is necessary for success on many tasks. To test our hypothesis, we train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics using a contrastive version of the Conditional Entropy Bottleneck (CEB) objective. We refer to these as Predictive Information SAC (PI-SAC) agents. We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels.
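
The quantity I(X_past; X_future) is an ordinary mutual information, so its definition can be illustrated on a tiny discrete example. The sketch below is only that: the function name `mutual_information` and the joint probability tables are hypothetical, and it has nothing to do with the contrastive CEB objective that the paper actually trains on pixel observations.

```python
import math


def mutual_information(joint):
    """Mutual information in nats for a joint table joint[i][j] = P(past=i, future=j)."""
    p_past = [sum(row) for row in joint]
    p_future = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_past[i] * p_future[j]))
    return mi


# The future copies the past: knowing the past removes all uncertainty.
deterministic = mutual_information([[0.5, 0.0], [0.0, 0.5]])
# The future is independent of the past: the past tells us nothing.
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
```

The first table gives log 2 nats (one bit) and the second gives zero, matching the intuition that predictive information measures how much the past reveals about what happens next.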

environment step, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

arXiv: 2007.12401

Country:

North America > United States > Michigan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

A Simulation Model Demonstrating the Impact of Social Aspects on Social Internet of Things

Zia, Kashif

arXiv.org Artificial Intelligence | Feb-23-2020

In addition to seamless connectivity and smartness, the objects in the Internet of Things (IoT) are expected to have social capabilities; these objects are termed "social objects". In this paper, an intuitive paradigm of social interactions between these objects is argued and modeled. The impact of social behavior on the interaction pattern of social objects is studied, taking Peer-to-Peer (P2P) resource sharing as an example application. The model proposed in this paper studies the implications of a competitive vs. a cooperative social paradigm as peers attempt to attain shared resources and services. The simulation results show that the social capabilities of the peers impart a significant increase in the quality of interactions between social objects. Through an agent-based simulation study, it is shown that the cooperative strategy is more efficient than the competitive strategy. Moreover, cooperation underpinned by real-life networking structure and mobility does not negatively impact the efficiency of the system at all; rather, it helps.

agent, mobility, small world, (15 more...)

arXiv.org Artificial Intelligence

arXiv: 2002.11507

Country: Asia > Middle East > Oman > Al Batinah North Governorate > Sohar (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Smart Houses & Appliances (0.63)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Internet of Things (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Linear Regression with Gradient Descent from Scratch in Numpy

#artificialintelligence | Oct-17-2019, 16:18:32 GMT

I strongly advise you to read the article linked above. It lays the foundations for the topic, and some of the math is already discussed there. To start out, I'll define my dataset: only three points that are in a linear relationship. I've chosen so few points only because the math will be shorter. Needless to say, the math won't be more complex for a larger dataset, it would just be longer, and I don't want to make some stupid arithmetic mistake. Then I'll set the coefficients beta 0 and beta 1 to some constant and define the cost function as the Sum of Squared Residuals (SSR/SSE).
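
The setup described here might look roughly like the sketch below. The three points, the starting constants, and the function name `ssr` are assumptions chosen for illustration, since the excerpt does not give the article's actual values, and plain Python is used instead of NumPy so the snippet has no dependencies.

```python
# Hypothetical dataset: three points lying on the line y = 1 + 2x.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Coefficients set to a constant before any training takes place.
beta_0, beta_1 = 0.0, 0.0


def ssr(beta_0, beta_1, points):
    """Sum of squared residuals for the line y = beta_0 + beta_1 * x."""
    return sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in points)


# With both coefficients at zero every prediction is 0, so each residual is just y.
residuals = [y - (beta_0 + beta_1 * x) for x, y in points]
initial_cost = ssr(beta_0, beta_1, points)  # 3**2 + 5**2 + 7**2
```

With only three points the cost can be checked by hand, which is the reason the author gives for keeping the dataset small.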

cost function, gradient descent, linear regression, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Gradient Descent Demystified in 5 Minutes

#artificialintelligence | Oct-13-2019, 17:43:31 GMT

The algorithm starts off by setting initial values for the coefficients. You are free to set the values to whatever you like (just not a string or boolean), but the common practice is to set them to 0. If I have two coefficients, let's say beta 0 and beta 1, I would set them both to zero initially. To keep things simple, let's say I'm dealing with a linear regression task and those betas are my coefficients (beta 0 being the bias, or intercept). The cost function is quite simple to read: you make a prediction, subtract that prediction from the actual value, and take the square of that. Now comes the part where you should know a bit of calculus to fully understand what's going on. You need to calculate partial derivatives for each of the coefficients, so the coefficients can be updated later. Some time ago I wrote an article on taking derivatives in Python, and it covers those topics to a degree. As my model has only two coefficients, I need to calculate two partial derivatives, one with respect to beta 0 and the other with respect to beta 1. Then comes the part in which you take those two functions and run what is known as an epoch, which is just a fancy word for a single iteration through the dataset.
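
The steps described above can be sketched as follows. This is a minimal illustration, not the article's own code: it assumes the cost is the mean of the squared errors, and the function names, the learning rate, the number of epochs, and the three data points are all hypothetical choices.

```python
def partial_beta_0(b0, b1, xs, ys):
    """Partial derivative of the mean squared error with respect to beta 0."""
    n = len(xs)
    return (-2.0 / n) * sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))


def partial_beta_1(b0, b1, xs, ys):
    """Partial derivative of the mean squared error with respect to beta 1."""
    n = len(xs)
    return (-2.0 / n) * sum(x * (y - (b0 + b1 * x)) for x, y in zip(xs, ys))


def gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    b0, b1 = 0.0, 0.0  # common practice: start both coefficients at zero
    for _ in range(epochs):  # one epoch = one pass over the whole dataset
        g0 = partial_beta_0(b0, b1, xs, ys)
        g1 = partial_beta_1(b0, b1, xs, ys)
        # Update both coefficients together, stepping against the gradient.
        b0 -= learning_rate * g0
        b1 -= learning_rate * g1
    return b0, b1


# Hypothetical data on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
beta_0, beta_1 = gradient_descent(xs, ys)
```

Both gradients are computed before either coefficient changes, so each update uses the same pair of values. On this data the coefficients settle close to an intercept of 1 and a slope of 2; a learning rate that is too large would make the updates diverge instead.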

coefficient, dataset, gradient descent demystified, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Non-ergodic Convergence Analysis of Heavy-Ball Algorithms

Sun, Tao, Yin, Penghang, Li, Dongsheng, Huang, Chun, Guan, Lei, Jiang, Hao

arXiv.org Machine Learning | Nov-9-2018

In this paper, we revisit the convergence of the Heavy-ball method, and present improved convergence complexity results in the convex setting. We provide the first non-ergodic O(1/k) rate result of the Heavy-ball algorithm with constant step size for coercive objective functions. For objective functions satisfying a relaxed strongly convex condition, linear convergence is established under weaker assumptions on the step size and inertial parameter than those made in the existing literature. We extend our results to a multi-block version of the algorithm with both cyclic and stochastic update rules. In addition, our results can also be extended to decentralized optimization, where the ergodic analysis is not applicable.
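
For readers unfamiliar with the method, the standard Heavy-ball update adds an inertial term to a gradient step: x_{k+1} = x_k - step * grad f(x_k) + inertia * (x_k - x_{k-1}). The sketch below applies it with a constant step size to a simple convex quadratic. The objective, the parameter values, and the function names are assumptions chosen for illustration and are not taken from the paper.

```python
def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [x[0], 10.0 * x[1]]


def heavy_ball(x0, step=0.1, inertia=0.5, iters=200):
    """Heavy-ball iteration with constant step size and inertial parameter."""
    prev = list(x0)  # x_{k-1}; equals x_0 at the start, so the first step has no momentum
    cur = list(x0)   # x_k
    for _ in range(iters):
        g = grad(cur)
        nxt = [c - step * gi + inertia * (c - p)
               for c, gi, p in zip(cur, g, prev)]
        prev, cur = cur, nxt
    return cur  # the last iterate itself, not an average of iterates


minimizer = heavy_ball([5.0, -3.0])
```

Returning the last iterate rather than an average of iterates is what "non-ergodic" refers to in the abstract. On this quadratic the iterates approach the minimizer at the origin for the chosen parameters.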

artificial intelligence, beta 0, machine learning, (14 more...)

arXiv.org Machine Learning

arXiv: 1811.01777

Country:

Asia > China (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.55)

Industry: Leisure & Entertainment > Sports > Tennis (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
